1 Introduction

Program synthesis is a technique for automatically deriving programs from spec- 
ifications of their behavior. One of the arguments made in favour of program 
synthesis is that it allows one to trace from the specification to the program. 

One way in which traceability information can be derived is to augment 
the program synthesis system so that manipulations and calculations it carries 
out during the synthesis process are annotated with information on what the 
manipulations and calculations were and why they were made. This information 
is then accumulated throughout the synthesis process, at the end of which, every 
artifact produced by the synthesis is annotated with a complete history relating 
it to every other artifact (including the source specification) which influenced its 
construction. This approach requires modification of the entire synthesis system 
— which is labor-intensive and hard to do without influencing its behavior. 

In this paper, we introduce a novel, lightweight technique for deriving trace- 
ability from a program specification to the corresponding synthesized code. 
Once a program has been successfully synthesized from a specification, small 
changes are systematically made to the specification and the effects on the syn- 
thesized program observed. 

We have partially automated the technique and applied it in an experiment
to one of our program synthesis systems, AutoFilter, and to the GNU C
compiler, GCC. The results are promising:

1. Manual inspection of the results indicates that most of the connections 
derived from the source (a specification in the case of AutoFilter, C 
source code in the case of GCC) to its generated target (C source code in 
the case of AutoFilter, assembly language code in the case of GCC) are 
correct. 

2. Around half of the lines in the target can be traced to at least one line of 
the source. 

3. Small changes in the source often induce only small changes in the target. 

2 Program Generation and Traceability

Traceability from requirements through to program code provides a rationale 
for the code. There are many reasons why this is desirable, of which some are: 

• It provides an aid to understanding the code. 

• It provides an aid to understanding the requirements. 

• It provides an aid to understanding why the code does or does not work 
correctly. This is particularly important in safety and mission critical 
applications. 

In practice, traceability can be hard to achieve when human programmers 
are involved. Programmers are reluctant to maintain documentation, and trace- 
ability is easily broken if programming artifacts (requirements, design elements, 
documents, code, etc.) are altered without making corresponding changes to the
other programming artifacts which they should affect or be affected by. 

Program synthesis is a technique for automatically deriving programs from 
specifications of their behavior. A good specification language enables
requirements to be stated in a natural way. Program changes can be realized
entirely as changes to the program’s specification.

The Automated Software Engineering Group at the NASA Ames Research 
Center has been researching and building domain-specific program synthesis 
systems (recently, AutoBayes [2], AutoFilter [5] and before that Amphion 
[3]). Since program synthesis systems are in general large and complex, and 
therefore not necessarily entirely trustworthy, part of our research has addressed 
the synthesis of non-code artifacts which provide evidence that the synthesized 
programs correctly implement their specifications. In particular, the group has 
been developing: 

• extensible automatic certification of synthesized programs [4, 1] — the 
synthesis system synthesizes code annotations along with the program 
code, and these annotations are used to guide a theorem prover to prove 
certain safety properties. 

• automatic documentation of synthesized programs [5] — program docu- 
mentation is synthesized at the same time as the program code. 

Traceability information is another kind of non-code information which pro- 
vides evidence of a program’s fitness for its task. 

In the following sections we outline two techniques by which this
traceability information can be automatically derived. The first technique,
which we call deep traceability, involves augmenting the program
synthesis system (including program schemas and axioms) so that calculations
carried out by the synthesis system are annotated with information on what
the calculations were and why they were made. We concentrate in this paper
on describing a second technique, which we call surface traceability, which is
novel and lightweight: once a program has been successfully synthesized from
a specification, small changes are systematically made to the specification and
the effects on the synthesized program are observed.

A note regarding notation: we call the input to the program generation 
process the source, and the output the target. In the case of a program synthesis
system, the source is a specification, and the target is a program (C code, for 
example). In the case of a compiler, the source is a (C) program, and the target 
is an assembly language program. 

3 Deep Traceability 

A technically sound but heavyweight approach to tracing from specifications
to generated programs involves augmenting the program synthesis system
(including program schemas and axioms) so that calculations carried out by the
synthesis system are annotated with information on what the calculations were
and why they were made. This approach was adopted in the ExplainIt!
extension of Amphion/NAV [5]. Amphion/NAV is a purely deductive synthesis
system, which extracts programs from proofs carried out in a tableau-style
theorem prover. The proofs can be structured into trees whose nodes are sets of
formulae, and an edge links two nodes if the first is derived from the
second. Explanations are attached to the axioms in Amphion/NAV’s domain
theory, propagated along the edges in the derivation tree, and finally incorpo- 
rated into an XML document which links each program statement to the axioms 
and parts of the program specification involved in its construction. 

The approach works well for a purely deductive synthesis approach but re- 
quires extensive modification of the entire synthesis system. 

In the rest of this document, we describe a new technique which can trace 
complex relationships between source and target and requires very little effort 
to implement. 

4 Surface Traceability 

We discover, automatically, relationships between source and target in the
following way. First, the synthesis system (or compiler) is applied to the source
to generate the target. We call the original source the nominal source and the
corresponding generated target the nominal target. Next, small changes (we call
them perturbations) are made, one at a time, to the source (yielding a perturbed
source), and a corresponding target program is synthesized (or compiled) from
each (resulting either in failure or in a perturbed target). As long as the
synthesis process is deterministic, differences between the nominal and perturbed
target programs can only be caused by the differences between the nominal and
perturbed sources. We therefore associate lines in the nominal target program
which are changed in a perturbed target program with the lines in the nominal
source which were changed by the perturbation. An example will demonstrate
how the technique works, as well as its flexibility.

Consider a system which automatically synthesizes English sentences from 
corresponding French specifications. For our current purposes, assume that 
one word of source (or target) is written per line of input (or output). Let 
the nominal source be “Ceci n’est pas une pomme.” From this we synthesize
the nominal target, “This is not an apple.” Apply separately the perturbations
pomme → banane, pas → pipe, and une → la, resulting in “This is not a banana.”
for the first perturbation, an error for the second, and “This is not the apple.” for
the third perturbation. We associate the differences between the perturbed and
nominal targets with the corresponding perturbations: in this case we associate
“apple” with “pomme” and “an” with “une”.
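
The example can be replayed with a minimal Python sketch. The word-for-word
translator below (LEXICON, synthesize, trace) is a hypothetical stand-in for a
real synthesis system, and Python's difflib stands in for the Unix diff program.
Because the article also changes from "an" to "a" under the first perturbation,
this sketch links "an" to both "pomme" and "une".

```python
import difflib

# Hypothetical toy lexicon; a stand-in for a real synthesis system.
LEXICON = {"Ceci": "This", "n'est": "is", "pas": "not",
           "une": "an", "la": "the", "pomme": "apple", "banane": "banana"}

def synthesize(source):
    """Translate one word per line; raise ValueError on an unknown word."""
    target = []
    for word in source:
        if word not in LEXICON:
            raise ValueError(f"cannot translate {word!r}")
        target.append(LEXICON[word])
    # English article agreement: "an" before a vowel, "a" otherwise.
    for k in range(len(target) - 1):
        if target[k] == "an" and target[k + 1][0] not in "aeiou":
            target[k] = "a"
    return target

def trace(nominal_source, perturbations):
    """Map nominal target line indices to the source lines that affect them."""
    nominal_target = synthesize(nominal_source)
    links = {}
    for line_no, replacement in perturbations:
        perturbed = list(nominal_source)
        perturbed[line_no] = replacement
        try:
            perturbed_target = synthesize(perturbed)
        except ValueError:
            continue  # a failed synthesis yields no trace information
        matcher = difflib.SequenceMatcher(
            a=nominal_target, b=perturbed_target, autojunk=False)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
            if tag in ("replace", "delete"):
                for i in range(i1, i2):
                    links.setdefault(i, []).append(line_no)
    return nominal_target, links

source = ["Ceci", "n'est", "pas", "une", "pomme"]
target, links = trace(source, [(4, "banane"), (2, "pipe"), (3, "la")])
for i, src_lines in sorted(links.items()):
    print(target[i], "<-", [source[n] for n in src_lines])
```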

The main advantages of the proposed technique are all closely related: 

1. It is very lightweight: it is extremely simple to implement, and quite 
effective. In our implementation (§5), perturbations are applied by a line 
editor, and differences are determined by the Unix diff program.

2. It does not require modification of the synthesis system. This greatly 
reduces the effort needed to employ it, and removes the possibility of 
inadvertently introducing errors into the synthesis system when it is mod- 
ified. 

3. It does not require detailed, or indeed any, knowledge of the internal
mechanisms of the synthesis system.

There are of course disadvantages, which we note here: 

1. It cannot identify every part of the source which influences the target. 

2. Some small changes in the specification can appear to have profound
effects when in fact the synthesized programs are equivalent. For example,
a variable name which occurs in many lines of the program might be
changed. Note that this effect would also appear in a deep traceability
approach unless measures were taken to overcome it, for example by
developing a notion of α-equivalence (in the sense of the λ-calculus) for the
generated programs.

3. Some changes cannot be made without also making other corresponding 
changes. For example, to discover the effect on the target of the name of 
a function which is declared in the source, all lines in the source which 
contain that function name have to be changed simultaneously, or an error 
will result. We therefore cannot discover the effect of function naming with 
only single-line changes to the nominal specification. 

4. There is a single manual part of the process: choosing which perturbations 
to apply. We discuss this problem below (§7).
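
Disadvantage 2 might be mitigated by normalizing generated names before the
targets are compared. The following Python sketch is only an approximation of
α-equivalence: it assumes temporaries follow a recognizable naming pattern
(the hypothetical tmp<N> here, which is not taken from AutoFilter or GCC
output), that the names t0, t1, ... are otherwise unused, and it ignores scoping.

```python
import re

# Assumed naming pattern for generated temporaries (hypothetical).
TEMP = re.compile(r"\btmp\d+\b")

def canonicalize(lines):
    """Rename temporaries to t0, t1, ... in order of first appearance."""
    names = {}

    def rename(match):
        # Reuse the canonical name if this temporary was seen before.
        return names.setdefault(match.group(0), f"t{len(names)}")

    return [TEMP.sub(rename, line) for line in lines]

def alpha_equal(program_a, program_b):
    """True if the programs differ only by consistent renaming of temporaries."""
    return canonicalize(program_a) == canonicalize(program_b)

nominal = ["double tmp1 = a * b;", "y = tmp1 + tmp2;"]
renamed = ["double tmp7 = a * b;", "y = tmp7 + tmp9;"]
changed = ["double tmp1 = a + b;", "y = tmp1 + tmp2;"]
print(alpha_equal(nominal, renamed), alpha_equal(nominal, changed))
```

Comparing canonicalized targets rather than raw targets would let a
perturbation that only renames temporaries be logged as having no effect.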

5 Implementation

Initially, we carried out changes entirely by hand. This indicated that the
technique might be interesting, so we decided to automate the process. The 
system is used as follows: 

• A list of perturbations is given to the system, expressed as commands (the
perturbation ed commands) for the Unix ed editor. Note that currently
each perturbation may only alter a single line in the source. 

• For each perturbation, a shell script applies the following steps: 

- The perturbation is applied to the nominal specification to obtain a
perturbed specification.

- The synthesis system is applied to the perturbed specification, either 
failing, or yielding a perturbed program. 

- If synthesis failed, this is noted in the log file; otherwise the
differences between the perturbed program and the nominal program are
computed (using Unix diff -w) and appended to the log file.

• Some irrelevant information is removed from the log file (leaving, for each
change, the specification line changed followed by the ed commands
produced by diff which describe the difference (if any) between the
corresponding perturbed and nominal programs).

• A number of emacs macros are used to: 

- Remove differences which only add lines to the nominal program — 
we exclude these since we are going to annotate the nominal program 
with the changes and in this case the lines which are added do not 
exist in the nominal program, only in the perturbed program. 

- Move perturbations which produced no effect (or only changed a date 
stamp in the generated target) into a separate file. 

- For each remaining difference, derive an ed command which will
append the perturbation ed commands to the lines in the program
which they affect.

• These derived ed commands are finally applied to the nominal program,
yielding the annotated program, in which each line may be annotated with
one or more perturbation ed commands, corresponding to the
perturbations which affected that line in the program (as judged by that line
differing in the perturbed program from the nominal program).
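
The steps above can be sketched compactly in Python. This is an illustration
under stated assumptions, not the shell, ed and emacs tooling described above:
difflib replaces diff, whitespace is stripped as a rough analogue of diff -w,
and annotate, strip_ws and the generator toy_generate are hypothetical names.
Labels follow the line-number-plus-letter style used in the figures.

```python
import difflib

def strip_ws(line):
    # Rough analogue of `diff -w`: compare lines with all whitespace removed.
    return "".join(line.split())

def annotate(nominal_source, perturbations, generate):
    """Apply single-line perturbations and annotate the nominal target.

    perturbations: list of (label, line_index, old_text, new_text).
    generate: callable from source lines to target lines; raises on failure.
    Returns (nominal_target, annotations, log).
    """
    nominal_target = generate(nominal_source)
    key_nominal = [strip_ws(line) for line in nominal_target]
    annotations = {i: [] for i in range(len(nominal_target))}
    log = {}
    for label, idx, old, new in perturbations:
        perturbed = list(nominal_source)
        perturbed[idx] = perturbed[idx].replace(old, new, 1)
        try:
            key_perturbed = [strip_ws(line) for line in generate(perturbed)]
        except Exception:
            log[label] = "failed"
            continue
        matcher = difflib.SequenceMatcher(
            a=key_nominal, b=key_perturbed, autojunk=False)
        # Pure insertions touch no nominal line, so they are dropped here.
        touched = [i for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
                   if tag in ("replace", "delete") for i in range(i1, i2)]
        for i in touched:
            annotations[i].append(label)
        log[label] = "annotated" if touched else "no effect"
    return nominal_target, annotations, log

def toy_generate(source):
    # Hypothetical generator: "a = 60" becomes "double a = 60.0;".
    out = []
    for line in source:
        name, value = line.split("=")
        out.append(f"double {name.strip()} = {float(value)};")
    return out

spec = ["a = 60", "b = 0.0782"]
perts = [("1A", 0, "60", "61"), ("1B", 0, "60", "sixty"),
         ("2A", 1, "0.0782", "0.07820")]
program, annotations, log = annotate(spec, perts, toy_generate)
for i, line in enumerate(program):
    print(line, annotations[i])
print(log)
```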


6 Results 

In this section we describe the results of applying our technique in two contexts: 
the AutoFilter program synthesis system, and the GNU C compiler, GCC. We
have not yet tried to formally evaluate the technique. 

6.1 AutoFilter

Initially, the technique was manually applied to an AutoFilter specification (a
simplified specification of part of the Deep Space 1 probe's attitude control
system). The specification has 134 lines (of which 44 are non-blank, non-comment
lines). The nominal program has 362 lines (of which 235 are non-blank,
non-comment lines). A total of 37 perturbations were manually applied: 9 led to
failed synthesis attempts, 9 did not lead to any changes in the synthesized
program, 6 changed only temporary variable names in the generated code (the
programs were α-equivalent), and 10 revealed interesting relationships between
the source and target.

In the first semiautomated experiment, using the same specification, 67
perturbations were chosen: 18 led to failed synthesis attempts, and 19 did not lead
to any change in the synthesized program. The remaining 30 generated
annotations of the synthesized program. Of these 30, 6 changed many lines in
the target, changing the number or order of input variables to the synthesized
code, or the size of its internal matrices and vectors. In total, 143 non-blank,
non-comment lines in the generated code were annotated.

In the second semiautomated experiment, applied to an AutoFilter
specification for thruster control during automated docking (source: 143 non-empty,
non-comment lines; output: 220 non-blank, non-comment lines), 43
perturbations were applied: 16 led to failed synthesis attempts, 6 did not lead to any
changes in the synthesized program, 9 led to localized changes, 9 led only to
temporary variable name changes, and 3 changed many lines in the target.

Manual inspection of the annotated target programs produced in the two
semiautomated experiments suggests that most of the lines in the synthesized
program can be traced back to one or more lines in the specification, and that the
relationships identified between specification and synthesized program are correct,
as judged by someone who understands the meanings of the specifications and
the synthesized programs.

We now present a more detailed example. 

6.2 GCC 

In order to demonstrate the flexibility of our technique for surface traceability, 
we applied it to the generation of assembly language code from C source code. 
Figure 1 shows the source program, and figure 2 shows the annotated assembly 
language program which was generated. In order to fit space requirements the
information has been manually edited: only the main section of the generated
assembly language code is shown, and each perturbation has been written as the
source code line number to which it applied and a letter, listed at the
beginning of the assembler line which it traced. The perturbations have been shown
directly in the source program listing. Only perturbations which traced lines
in the main section of assembler code in figure 2 are shown. Others either had
no effect, caused an error, or affected a part of the assembler code outside the
main section.

The resulting annotated assembly code identifies many relationships between
it and the C source code. Here is our interpretation of some of the results. First,
note that most perturbations only affect a small number of lines in the generated
assembler. The exceptions to this are perturbations 8A and 10E, which change
many lines of the generated assembler code. This is probably because, in changing
the datatypes which represent i and a, they affect register allocation and memory
offsets, although we can’t conclude this from our experiments; to draw this
conclusion probably requires some knowledge of assembler and the amount of
memory needed to store ints versus doubles versus floats. Perturbation 27H
also results in a significant change, possibly for a similar reason. Perturbations
16B, 16C, 16R identify those parts of the target associated with the head of
the for loop. Perturbations 34L and 37M trace the call to the exp function.
Perturbation 30P traces the assignment of the result to y. Perturbation 40Q
traces where y is printed. Perturbation 16R traces the add instruction to the
loop header. Other relationships between source and target are made evident
by our experiment: readers are invited to determine these themselves.

7 Conclusions 

The technique we have presented, though extremely simple, has the power to 
discover relationships between source and target which would otherwise require 
detailed knowledge of source and target languages, the meaning of the source 
and target programs, or the program generation (compilation) mechanism itself. 
We successfully leverage the automation of the code generation (or compilation) 
process, which is an essential component of the technique. 

The results are promising: 

1. Manual inspection of the results indicates that most of the connections 
derived from the source (a specification in the case of AutoFilter, C
source code in the case of GCC) to its generated target (C source code in 
the case of AutoFilter, assembly language code in the case of GCC) are 
correct. 

2. Around half of the lines in the target can be traced to at least one line of 
the source. 

3. Small changes in the source often (especially in the GCC example) induce 
only small changes in the target. 

In the AutoFilter example, many of the lines in the synthesised program 
were marked as changed merely because a variable name had changed. This 
suggests that a more sophisticated way of determining differences, which takes
account of unimportant changes in variable names (i.e. α-equivalence), would be
useful. Similarly, in the GCC example, some changes affected a large number of 
lines in the generated assembler code because they changed memory addresses 
or register allocation. 

                                        #include <stdio.h>
                                        #include <math.h>

                                        int main()
                                        {
                                        double t, tf,
                                        x, y;
8A  int → double                        int i;

10E double → float                      double a = 60,
                                        b = 0.0782,
                                        kappa = 1.95,
13G 0.5 → 1                             c = 0.5;
14H t → t+1                             tf = 1.0/5.0;

16B 0 → 1   16C < → >   16R ++ → --     for(i=0; i < 100; i++)
                                        {
18D tf → t                              t = i * tf;

                                        /*
                                        y = a*exp(-b) ...
                                        x = c*kappa*a*...
                                        */

25F c → kappa                           x = c*
                                        kappa*
27H t → t+1                             ((1-pow(kappa,t))/
                                        (1-
29I kappa → c                           kappa));
30P y → x                               y = a*
                                        exp(
32J b → b-1                             -b)*
33N 1 → 2                               ((1-
34L exp → log                           exp(
35K b → kappa                           -b*
                                        t))/
37M exp → log                           (1-exp(
                                        -b)));

40Q x → kappa                           printf("%f %f ", x, y);
                                        }}


Figure 1: The C source code, and a list of the perturbations which applied to 
the section of assembly language code in figure 2. Note that since our technique 
traces target lines of code to source lines of code, we have split compound 
statements into multiple lines. 

Perturbation annotations (left-hand column of the figure, in order):

8A 16B
8A

8A 16C

8A

18D

10E

27H

10E 25F
10E

8A 13G

8A 10E 13G
10E

10E 29I
10E

10E 32J
10E 27H 32J

10E

10E 35K
10E

10E 27H 32J

34L

10E

10E 27H 32J

37M

10E

10E 27H 32J
8A 13G 33N

10E

8A 13G

30P

8A 10E 13G 33N
40Q

8A

16R 8A
8A

Assembly code (right-hand column of the figure):

        st %g0, [%fp-52]
.LL3:   ld [%fp-52], %o0
        cmp %o0, 99; ble .LL6
        nop; b .LL4; nop
.LL6:   ld [%fp-52], %f4; fitod %f4, %f2
        ldd [%fp-32], %f4; fmuld %f2, %f4, %f2
        std %f2, [%fp-24]
        ldd [%fp-80], %o0
        ldd [%fp-24], %o2
        call pow, 0; nop; fmovs %f0, %f4; fmovs %f1, %f5
        ldd [%fp-88], %f2
        ldd [%fp-80], %f6; fmuld %f2, %f6, %f2
        sethi %hi(.LLC5), %o1; or %o1, %lo(.LLC5), %o0
        ldd [%o0], %f6; fsubd %f6, %f4, %f4
        sethi %hi(.LLC5), %o1; or %o1, %lo(.LLC5), %o0
        ldd [%o0], %f6
        ldd [%fp-80], %f8
        fsubd %f6, %f8, %f6; fdivd %f4, %f6, %f4
        fmuld %f2, %f4, %f2; std %f2, [%fp-40]
        ldd [%fp-72], %f2
        fnegs %f2, %f4; fmovs %f3, %f5; std %f4, [%fp-16]
        ldd [%fp-16], %o2; mov %o2, %o0; mov %o3, %o1;
        call exp, 0; nop
        std %f0, [%fp-96]
        ldd [%fp-72], %f4
        fnegs %f4, %f2; fmovs %f5, %f3; ldd [%fp-24], %f4
        fmuld %f2, %f4, %f6; std %f6, [%fp-16]
        ldd [%fp-16], %o2; mov %o2, %o0; mov %o3, %o1
        call exp, 0
        nop
        std %f0, [%fp-104]; ldd [%fp-72], %f2
        fnegs %f2, %f8; fmovs %f3, %f9; std %f8, [%fp-16]
        ldd [%fp-16], %o2; mov %o2, %o0; mov %o3, %o1
        call exp, 0
        nop; fmovs %f0, %f2; fmovs %f1, %f3
        ldd [%fp-64], %f6
        ldd [%fp-96], %f10; fmuld %f10, %f6, %f4
        sethi %hi(.LLC5), %o1; or %o1, %lo(.LLC5), %o0
        ldd [%o0], %f8
        ldd [%fp-104], %f10
        fsubd %f8, %f10, %f6
        sethi %hi(.LLC5), %o1; or %o1, %lo(.LLC5), %o0
        ldd [%o0], %f8; fsubd %f8, %f2, %f2; fdivd %f6, %f2, %f6;
        fmuld %f4, %f6, %f2
        std %f2, [%fp-48]
        sethi %hi(.LLC6), %o1; or %o1, %lo(.LLC6), %o0
        ld [%fp-40], %o1; ld [%fp-36], %o2
        ld [%fp-48], %o3; ld [%fp-44], %o4; call printf, 0; nop
.LL5:   ld [%fp-52], %o0
        add %o0, 1, %o1
        st %o1, [%fp-52]; b .LL3

Figure 2: The annotated assembly language code.


The perturbations to be applied to the source were chosen manually. They
were not systematic: we probably did not find all the ways the source can affect
the target, and the perturbations were probably to some extent targeted (we know
what changes make sense and so might lead to meaningful changes in the program).
A more systematic methodology would make changes at random in the source.
However, we would expect the vast majority of synthesis/compilation attempts 
on randomly perturbed sources to fail. We could get around this problem by em- 
ploying a grammar for the source (specification) language to randomly generate 
perturbations which created grammatically correct perturbed sources. 
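
One way this idea might look is sketched below in Python. The token classes
are a crude, hypothetical stand-in for a real grammar of the source language:
tokens in the same class are assumed to be interchangeable, which a real
implementation would have to establish with the language's parser. All names
(CLASSES, TOKEN, candidates, sample_perturbations) are illustrative.

```python
import random
import re

# Assumed token classes: members of a class are treated as interchangeable.
CLASSES = [
    {"<", ">", "<=", ">="},
    {"++", "--"},
    {"exp", "log", "sin", "cos"},
]
# Identifiers, numbers, the multi-character operators above, then any symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+\.?\d*|\+\+|--|<=|>=|\S")

def candidates(line):
    """Yield copies of `line`, each with one token swapped within its class."""
    for match in TOKEN.finditer(line):
        for cls in CLASSES:
            if match.group(0) in cls:
                for alt in sorted(cls - {match.group(0)}):
                    yield line[:match.start()] + alt + line[match.end():]

def sample_perturbations(source, count, seed=0):
    """Randomly pick `count` single-line perturbations as (index, new_line)."""
    pool = [(i, new) for i, line in enumerate(source)
            for new in candidates(line)]
    return random.Random(seed).sample(pool, min(count, len(pool)))

loop_head = "for(i=0; i < 100; i++)"
for perturbed_line in candidates(loop_head):
    print(perturbed_line)
```

Each sampled pair alters a single line, matching the single-line restriction
of the current implementation.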

The technique now needs to be evaluated more thoroughly: we need to apply 
it to more and different kinds of source and synthesis/compilation systems. To
ease automation, only single line changes were made: this is limiting because 
some changes only make sense when made in conjunction with other changes. 
The annotated target produced by the system is adequate for experiments but a 
better form of output would be useful, for example lines in the annotated target 
could be linked using HTML to the lines in the source which affected them or 
presented in the form of a traceability matrix. 
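
As an illustration of the matrix form, the following Python sketch renders
trace links as plain text, with one row per target line and one column per
source line. The function name and the input format (a mapping from target
line number to the source line numbers that affect it) are assumptions made
for this example.

```python
def traceability_matrix(links, n_source, n_target):
    """Render rows = target lines, columns = source lines; 'x' marks a trace."""
    header = " " * 5 + "".join(f"S{s:<3}" for s in range(1, n_source + 1))
    rows = [header.rstrip()]
    for t in range(1, n_target + 1):
        cells = ["x" if s in links.get(t, ()) else "."
                 for s in range(1, n_source + 1)]
        rows.append((f"T{t:<4}" + "".join(f"{c:<4}" for c in cells)).rstrip())
    return "\n".join(rows)

# Target line 1 is traced to source line 2; target line 3 to source lines 1 and 2.
matrix = traceability_matrix({1: [2], 3: [1, 2]}, n_source=2, n_target=3)
print(matrix)
```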